The concept of breakdown point was introduced by Hampel [Ph.D.
dissertation (1968), Univ. California, Berkeley; Ann. Math. Statist. 42
(1971) 1887-1896] and developed further by, among others, Huber
[Robust Statistics (1981). Wiley, New York] and Donoho and Huber
[In A Festschrift for Erich L. Lehmann (1983) 157-184. Wadsworth,
Belmont, CA]. It has proved most successful in the context of
location, scale and regression problems. Attempts to extend the concept
to other situations have not met with general acceptance. In this
paper we argue that this is connected to the fact that in the
location, scale and regression problems the translation and affine groups
give rise to a definition of equivariance for statistical functionals.
Comparisons in terms of breakdown points seem only useful when
restricted to equivariant functionals and even here the connection
between breakdown and equivariance is a tenuous one.

1. Introduction. 

1.1. Contents. In Section 1 we give a short overview of the concepts of 
breakdown and equivariance and a brief discussion of previous work. Sec- 
tion 2 contains notation and the standard definition of breakdown and in 
Section 3 we derive an upper bound for the breakdown points of equivari- 
ant statistical functionals. Section 4 contains some old and new examples in 
light of the results of Section 3. The attainability of the bound is discussed 
in Section 5 and finally in Section 6 we argue that the connection between 
breakdown and equivariance is fragile. 
1.2. Breakdown points and equivariance. The notion of breakdown point
was introduced by Hampel (1968, 1971). Huber (1981) took a functional
analytical approach; a simplified version for finite samples was introduced by
Donoho (1982) and Donoho and Huber (1983). To be of practical use a
definition of breakdown should be simple, reflect behavior for finite
samples and allow comparisons between relevant statistical functionals. With
some proviso (see Section 6) these goals have been achieved for location,
scale and regression problems in $\mathbb{R}^k$ [see, e.g., Hampel (1975), Rousseeuw
(1984, 1985), Lopuhaä and Rousseeuw (1991), Davies (1993), Stahel (1981),
Donoho (1982), Tyler (1994) and Gather and Hilker (1997)] and for related
problems [see, e.g., Ellis and Morgenthaler (1992), Davies and Gather (1993),
Becker and Gather (1999), Hubert (1997), Terbeck and Davies (1998), He
and Fung (2000) and Müller and Uhlig (2001)]. This success has led many
authors to develop definitions applicable in other situations. We mention 
nonlinear regression [Stromberg and Ruppert (1992)], time series [Martin 
and Jong (1977), Papantoni-Kazakos (1984), Tatum and Hurvich (1993), 
Lucas (1997), Mendes (2000), Ma and Genton (2000) and Genton (2003)], ra- 
dial data [He and Simpson (1992)], the binomial distribution [Ruckstuhl and 
Welsh (2001)] and more general situations as in Sakata and White (1995), He 
and Simpson (1993) and Genton and Lucas (2003). An essential component 
of the theory of high breakdown location, scale and regression functionals 
is the idea of equivariance. With the exception of He and Simpson (1993), 
none of the above generalizations of breakdown point incorporates a concept 
of equivariance. It is as if the equivariance part has been relegated to the 
small print and then forgotten [see 't Hooft (1997) for the role of the small 
print in physics]. The main purpose of this paper is to emphasize the role of 
a group structure, to give some new examples and to point out the fragility 
of the connection. 

2. A definition of breakdown point. We consider a measurable sample
space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and the family $\mathcal{P}$ of all nondegenerate
probability measures on this space. We assume that a pseudometric $d$ is
defined on $\mathcal{P}$ which satisfies

(2.1) $\sup_{P,Q \in \mathcal{P}} d(P,Q) = 1$

and for all $P$, $Q_1$, $Q_2$ and $\alpha$, $0 < \alpha < 1$,

(2.2) $d(\alpha P + (1-\alpha)Q_1, \alpha P + (1-\alpha)Q_2) \le 1 - \alpha$.

We consider functionals $T$ which map $\mathcal{P}$ into a parameter space $\Theta$ which is
equipped with a pseudometric $D$ on $\Theta \times \Theta$ satisfying

(2.3) $\sup_{\theta_1,\theta_2} D(\theta_1,\theta_2) = \infty$.
The breakdown point $\varepsilon^*(T,P,d,D)$ of the functional $T$ at the distribution
$P$ with respect to the pseudometrics $d$ and $D$ is defined by

(2.4) $\varepsilon^*(T,P,d,D) = \inf\{\varepsilon > 0 : \sup_{d(P,Q)<\varepsilon} D(T(P),T(Q)) = \infty\}$.

The finite-sample replacement breakdown point of a functional $T$ is defined
as follows. If $x_n = (x_1,\ldots,x_n)$ is a sample of size $n$, we denote its empirical
distribution by $P_n = \sum_{i=1}^{n} \delta_{x_i}/n$. Let $y_{n,k}$ be a sample obtained from $x_n$ by
altering at most $k$ of the $x_i$ and denote the empirical distribution of $y_{n,k}$
by $Q_{n,k}$. The finite-sample breakdown point (fsbp) of $T$ at the sample $x_n$
(or $P_n$) is then defined by [see Donoho and Huber (1983)]

(2.5) $\mathrm{fsbp}(T, x_n, D) = \frac{1}{n} \min\{k \in \{1,\ldots,n\} : \sup_{Q_{n,k}} D(T(P_n),T(Q_{n,k})) = \infty\}$.
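As a rough illustration of the replacement breakdown point (2.5) on the real line with $D(\theta_1,\theta_2) = |\theta_1 - \theta_2|$, the following Python sketch replaces $k$ sample points by ever larger values and reports the smallest $k$ for which the estimate appears to diverge. The names `breaks_down` and `fsbp`, the contamination values and the growth test are assumptions made for this illustration. Only one contamination pattern is tried, so in general the sketch can only detect breakdown, not rule it out; for the mean and the median this pattern happens to be the worst case.

```python
from statistics import mean, median

def breaks_down(estimator, sample, k, scales=(1e3, 1e6, 1e9)):
    """Heuristic stand-in for 'sup D = infinity' in (2.5).

    The first k points are replaced by a large value m; breakdown is
    declared if the shift of the estimate keeps growing with m.
    """
    base = estimator(sample)
    shifts = []
    for m in scales:
        corrupted = [m] * k + list(sample[k:])  # alter k of the n points
        shifts.append(abs(estimator(corrupted) - base))
    growing = all(b > 10 * a for a, b in zip(shifts, shifts[1:]))
    return growing and shifts[-1] > 1e6

def fsbp(estimator, sample):
    """Smallest fraction k/n of replaced points that triggers breakdown."""
    n = len(sample)
    for k in range(1, n + 1):
        if breaks_down(estimator, sample, k):
            return k / n
    return None

sample = [1.3, 1.5, 1.8, 2.0, 2.4, 2.9, 3.1]
print(fsbp(mean, sample), fsbp(median, sample))
```

For this sample of size 7 the sketch reports 1/7 for the mean and 4/7 for the median, the latter being $\lfloor (n+1)/2 \rfloor / n$.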



3. Groups and equivariance. 



3.1. An upper bound for the breakdown point. Let $G$ be a group of
measurable transformations $g$ of $\mathcal{X}$ onto itself with unit element $\iota$. For any $P \in \mathcal{P}$
and any $g \in G$ we define $P^g$ by $P^g(B) = P(g^{-1}(B))$. The group $G$ induces
a group $H_G = \{h_g : g \in G\}$ of transformations $h_g : \Theta \to \Theta$ and a functional
$T : \mathcal{P} \to \Theta$ is called equivariant with respect to $G$ if

(3.1) $T(P^g) = h_g(T(P))$ for all $g \in G$, $P \in \mathcal{P}$.

We set

(3.2) $G_1 = \{g \in G : \lim_{n\to\infty} \inf_{\theta} D(\theta, h_{g^n}(\theta)) = \infty\}$.

The restriction of $g \in G$ to a set $B \in \mathcal{B}$ will be denoted by $g|_B$. Given this
we define

(3.3) $\Delta(P) = \sup\{P(B) : B \in \mathcal{B},\ g|_B = \iota|_B \text{ for some } g \in G_1\}$.

The functional $\Delta(P)$ appears explicitly in the expression for the highest
possible breakdown point. We give two examples. If $G$ is the translation
group on $\mathbb{R}^k$, then the defining set in (3.3) is empty so that $\Delta(P) = 0$. For
affine transformations $g(x) = Ax + b$ the condition $g|_B = \iota|_B$ reads
$Ax + b = x$ for $x \in B$, so that $B$ lies in a lower-dimensional hyperplane, and
consequently $\Delta(P)$ is the greatest measure of a lower-dimensional hyperplane.

Theorem 3.1. With the above notation and under the assumption that
$G_1 \neq \emptyset$ we have

(3.4) $\varepsilon^*(T,P,d,D) \le (1 - \Delta(P))/2$

for all $G$-equivariant functionals $T$, for all $P \in \mathcal{P}$, for all pseudometrics
$d$ and $D$ satisfying (2.1)-(2.3).
Proof. Let $B_0$ and $g \in G_1$ be such that $g|_{B_0} = \iota|_{B_0}$. Consider the
measures defined by $Q_1(B) = P(B \cap B_0)$, $Q_2(B) = P(B) - Q_1(B)$ and

$Q_n(B) = (Q_2(B) + Q_2^{g^{-n}}(B))/2 + Q_1(B)$

for $B \in \mathcal{B}$. As $Q_1^{g^n} = Q_1^{g^{-n}} = Q_1$ we have

$Q_n^{g^n} = (Q_2^{g^n} + Q_2)/2 + Q_1$

and on using (2.2) it follows that $d(Q_n^{g^n},P) \le (1 - P(B_0))/2$ and
$d(Q_n,P) \le (1 - P(B_0))/2$. Clearly

$D(T(Q_n^{g^n}),T(Q_n)) \le D(T(P),T(Q_n^{g^n})) + D(T(P),T(Q_n))$.

By equivariance $T(Q_n^{g^n}) = h_{g^n}(T(Q_n))$, so the definition of $G_1$ implies

$\lim_{n\to\infty} (D(T(P),T(Q_n^{g^n})) + D(T(P),T(Q_n))) = \infty$

and we deduce that for any $\varepsilon > (1 - P(B_0))/2$

$\sup_{d(P,Q)<\varepsilon} D(T(P),T(Q)) = \infty$.

The claim of the theorem follows. □

Theorem 3.2. With the above notation and under the assumption $G_1 \neq \emptyset$
we have

(3.5) $\mathrm{fsbp}(T, x_n, D) \le \lfloor (n - n\Delta(P_n) + 1)/2 \rfloor / n$.

Proof. The proof follows the lines of the proof of Theorem 3.1. For the
details we refer to Davies and Gather (2002). □

4. Examples.

4.1. Location functionals and the translation group. We take $\mathcal{X}$ to be
$k$-dimensional Euclidean space $\mathbb{R}^k$ and $G$ the translation group. The
parameter space $\Theta$ is $\mathbb{R}^k$ and the group $H_G$ is again the translation group. The
pseudometric $D$ on $\Theta$ is the Euclidean metric. Any pseudometric $d$ which
satisfies (2.1) and (2.2) will suffice. This applies for all other examples so
we no longer specify $d$. As mentioned just after (3.3), we have $\Delta(P) = 0$ for
all $P$ and Theorem 3.1 now states that $\varepsilon^*(T,P,d,D) \le 1/2$ for any
translation equivariant functional.

4.2. Scatter functionals and the affine group. $\mathcal{X}$ is $k$-dimensional
Euclidean space and $G$ is the affine group, the parameter space $\Theta$ is the
space $\mathcal{S}_k$ of nonsingular symmetric $(k \times k)$-matrices and the elements $h_g$ of
$H_G$ are defined by

(4.1) $h_g(\sigma) = A \sigma A^t$,
where $g(x) = Ax + b$. The pseudometric on $\Theta$ is given by

(4.2) $D(\sigma_1,\sigma_2) = |\log(\det(\sigma_1 \sigma_2^{-1}))|$, $\sigma_1, \sigma_2 \in \mathcal{S}_k$,

and hence $G_1 = \{g : g(x) = Ax + a, \det(A) \neq 1\}$. We have $\Delta(P) = \sup\{P(B) : B$
is a hyperplane of dimension $\le k-1\}$ and Theorem 3.1 is now Theorem 3.2
of Davies (1993).
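To see how the metric (4.2) interacts with the induced group (4.1), the following Python sketch computes $D(\sigma, h_{g^n}(\sigma))$ for $2 \times 2$ matrices. A direct calculation gives $D(\sigma, h_{g^n}(\sigma)) = 2n\,|\log|\det(A)||$, independently of $\sigma$, so the distance diverges whenever $|\det(A)| \neq 1$. The helper names and the example matrices are assumptions chosen for this illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][r] * b[r][j] for r in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def h(A, sigma):
    """Induced transformation (4.1): sigma -> A sigma A^t."""
    return matmul(matmul(A, sigma), transpose(A))

def D(s1, s2):
    """Pseudometric (4.2): |log det(s1 s2^{-1})|."""
    return abs(math.log(det2(matmul(s1, inv2(s2)))))

def D_iterated(A, sigma, n):
    """D(sigma, h_{g^n}(sigma)) obtained by applying h_g n times."""
    s = sigma
    for _ in range(n):
        s = h(A, s)
    return D(sigma, s)

sigma = [[1.0, 0.5], [0.5, 2.0]]
stretch = [[2.0, 1.0], [0.0, 1.0]]  # det = 2: distance grows like 2 n log 2
shear = [[1.0, 3.0], [0.0, 1.0]]    # det = 1: distance stays at 0
print([round(D_iterated(stretch, sigma, n), 4) for n in (1, 2, 3)])
print(D_iterated(shear, sigma, 3))
```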

4.3. Regression functionals and the translation group. $\mathcal{X}$ is now
$(k+1)$-dimensional Euclidean space $\mathbb{R}^k \times \mathbb{R}$, where the first $k$ components define
the design points and the last component is the corresponding value of $y$.
The group $G$ consists of all transformations

(4.3) $g((x^t,y)^t) = (x^t, y + x^t a)^t$, $(x^t,y)^t \in \mathbb{R}^k \times \mathbb{R}$,

with $a \in \mathbb{R}^k$. The space $\Theta$ is $\mathbb{R}^k$ and a functional $T : \mathcal{P} \to \Theta$ is equivariant
with respect to the group if $T(P^g) = T(P) - a$. The arguments go through
as in Section 4.2 and the result is Theorem 3.1 of Davies (1993).

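4.4. Time series and realizable linear filters. We denote the space of
doubly infinite series of complex numbers by $\mathbb{C}^{\mathbb{Z}}$ and define

(4.4) $\mathcal{X} = \{x \in \mathbb{C}^{\mathbb{Z}} : \sum_{j=0}^{\infty} |x_{n-j}|(1+\delta)^{-j} < \infty \text{ for all } n \in \mathbb{Z}\}$

for some $\delta > 0$ and equip $\mathcal{X}$ with the usual Borel $\sigma$-algebra. Define the group
$G$ by

(4.5) $G = \{g : g : \Gamma_{1+\varepsilon} \to \mathbb{C}$, analytic and bounded with $\inf_{z \in \Gamma_{1+\varepsilon}} |g(z)| > 0\}$,

where $\Gamma_r$ denotes the open disc in $\mathbb{C}$ of radius $r$ and $\varepsilon > \delta$. Each such
$g \in G$ has a power series expansion $g(z) = \sum_{j=0}^{\infty} g_j z^j$ and defines a linear
filter $g$ on $\mathcal{X}$ by

(4.6) $(g(x))_n = \sum_{j=0}^{\infty} x_{n-j} g_j$, $n \in \mathbb{Z}$.

The linear filters $g$ form the group $G$. The parameter space $\Theta$ is the space
of finite distribution functions $F$ on $(-\pi,\pi]$. For $F \in \Theta$ and $g \in G$ we
define $h_g(F)$ by

(4.7) $h_g(F) = F_g$ where $dF_g(\lambda) = |g(\exp(i\lambda))|^2\, dF(\lambda)$.

Finally, the pseudometric $D$ on $\Theta$ is defined by

(4.8) $D(F_1,F_2) = \int_{-\pi}^{\pi} |\log(dF_1/dF_2)|\, d\lambda$ if $F_1 \asymp F_2$, and $D(F_1,F_2) = \infty$ otherwise,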
where $F_1 \asymp F_2$ means that the two measures are absolutely continuous with
respect to each other. The conditions placed on the group $G$ imply that

$\inf_{\lambda \in (-\pi,\pi]} |g(\exp(i\lambda))| > 0$, $dF_g/dF = |g(\exp(i\lambda))|^2$

and

$D(F, h_g(F)) = 2 \int_{-\pi}^{\pi} |\log|g(\exp(i\lambda))||\, d\lambda$

for any $F$ in $\Theta$ and $g \in G$. This implies

$D(F, h_{g^n}(F)) = 2n \int_{-\pi}^{\pi} |\log|g(\exp(i\lambda))||\, d\lambda$

and hence

$\lim_{n\to\infty} n \int_{-\pi}^{\pi} |\log|g(\exp(i\lambda))||\, d\lambda = \infty$

unless $|g(\exp(i\lambda))| = 1$, $-\pi < \lambda \le \pi$. This, however, would force $g$ to be a
constant of modulus one, and so we see that $G_1 \neq \emptyset$. Theorem 3.1 gives

$\varepsilon^*(T,P,d,D) \le (1 - \Delta(P))/2$.

In the present situation the definition (3.3) of $\Delta(P)$ reduces to

(4.9) $\Delta(P) = \sup\{P(B) : B = \{x : x_n = \sum_{j=0}^{\infty} x_{n-j} g_j,\ n \in \mathbb{Z}\},\ g \in G_1\}$,

which is effectively the maximum probability that $x$ is deterministic. If $P$ is a
stationary Gaussian measure with spectral distribution $F$ whose absolutely
continuous part has density $f_{ac}$, then the Szegő (1920) alternative is $\Delta(P) = 0$
or 1 according to whether

$\int_{-\pi}^{\pi} \log(f_{ac}(\lambda))\, d\lambda > -\infty$ or $= -\infty$.
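The linear growth of $D(F, h_{g^n}(F))$ in $n$ can be checked numerically. The Python sketch below approximates $2\int_{-\pi}^{\pi} |\log|g(\exp(i\lambda))^n||\, d\lambda$ by a midpoint rule, using that the $n$-fold filter has transfer function $g(z)^n$. The filter $g(z) = 1 + z/2$, the function names and the grid size are assumptions made for this illustration; this $g$ is analytic, bounded and bounded away from zero on any disc of radius less than 2.

```python
import cmath
import math

def integral_abs_log_modulus(g, power=1, m=20000):
    """Midpoint rule for the integral of |log|g(exp(i*lam))**power||
    over (-pi, pi]."""
    step = 2 * math.pi / m
    total = 0.0
    for j in range(m):
        lam = -math.pi + (j + 0.5) * step
        total += abs(math.log(abs(g(cmath.exp(1j * lam)) ** power)))
    return total * step

def D_from_filter(g, n=1):
    """D(F, h_{g^n}(F)), using dF_{g^n}/dF = |g(exp(i*lam))|**(2n)."""
    return 2 * integral_abs_log_modulus(g, power=n)

def moving_average(z):
    return 1 + 0.5 * z          # hypothetical filter with g_0 = 1, g_1 = 1/2

def unimodular(z):
    return cmath.exp(0.7j)      # constant of modulus one

d1 = D_from_filter(moving_average, 1)
d5 = D_from_filter(moving_average, 5)
print(d1, d5 / d1)              # the ratio is 5 up to rounding
print(D_from_filter(unimodular, 5))
```

The second filter has modulus one on the unit circle, so its distance stays at zero for every $n$.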



4.5. The Michaelis-Menten model. The Michaelis-Menten model may
be parameterized as

(4.10) $y = \frac{ax}{cx + 1/a} + \varepsilon$, $a, c, x \in \mathbb{R}_+ = (0,\infty)$,

with $\theta = (a,c)$. $\mathcal{X}$ is $\mathbb{R}_+ \times \mathbb{R}$ and the elements $g$ of $G$ are defined by
$g((x,y)) = (\alpha x, y)$ with $\alpha > 0$. The elements $h_g$ of the induced group are
given by $h_g(\theta) = (a/\sqrt{\alpha}, c/\sqrt{\alpha})$. We take the metric $D$ to be given by

$D(\theta_1,\theta_2) = |a_1 - a_2| + |a_1^{-1} - a_2^{-1}| + |c_1 - c_2|$.

As $g((x,y)) = (x,y)$ only for $g = \iota$ we see that $G_1 \neq \emptyset$ and that $\Delta(P) = 0$.
This implies a highest finite-sample breakdown point of $\lfloor (n+1)/2 \rfloor / n$,
which is clearly attainable. Extensions to the real linear fractional group are
possible.
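The induced transformation can be verified directly: rescaling $x$ by $\alpha$ leaves the regression curve unchanged when $(a,c)$ is replaced by $(a/\sqrt{\alpha}, c/\sqrt{\alpha})$. The Python sketch below checks this numerically and also checks a lower bound for $D(\theta, h_{g^n}(\theta))$ that does not depend on $\theta$: writing $s = \alpha^{n/2}$, the arithmetic-geometric mean inequality applied to the first two terms of $D$ gives $D(\theta, h_{g^n}(\theta)) \ge 2|s-1|/\sqrt{s}$, which tends to infinity with $n$ for $\alpha \neq 1$. The function names and parameter values are assumptions made for this illustration.

```python
import math

def mm(a, c, x):
    """Regression function a*x / (c*x + 1/a) of the Michaelis-Menten model."""
    return a * x / (c * x + 1.0 / a)

def h(alpha, theta):
    """Induced transformation for g((x, y)) = (alpha * x, y)."""
    a, c = theta
    return (a / math.sqrt(alpha), c / math.sqrt(alpha))

def D(t1, t2):
    """Metric |a1 - a2| + |1/a1 - 1/a2| + |c1 - c2|."""
    (a1, c1), (a2, c2) = t1, t2
    return abs(a1 - a2) + abs(1 / a1 - 1 / a2) + abs(c1 - c2)

def lower_bound(alpha, n):
    """Bound 2|s - 1| / sqrt(s) with s = alpha**(n/2), free of theta."""
    s = alpha ** (n / 2)
    return 2 * abs(s - 1) / math.sqrt(s)

theta, alpha = (2.0, 0.5), 3.0
for x in (0.1, 1.0, 7.5):
    print(mm(*theta, x), mm(*h(alpha, theta), alpha * x))  # equal pairs

for n in (1, 5, 10):
    # g^n scales x by alpha**n
    print(n, D(theta, h(alpha ** n, theta)), lower_bound(alpha, n))
```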
4.6. Logistic regression I. Logistic regression is a binomial model with
covariates. For the binomial distribution itself it has been shown by
Ruckstuhl and Welsh (2001) that a breakdown point of 1 is attainable by
functionals which are equivariant with respect to the two-element group $G = \{\iota, g\}$
where $g(x) = 1 - x$ and $h_g(p) = 1 - p$. As pointed out by Peter Rousseeuw
(comment at the ICORS 2002 meeting in Vancouver), this is the natural
group for the binomial distribution. The logistic regression model is

(4.11) $P(Y = 1 | x) = \exp(\theta_0 + x^t\theta)/(1 + \exp(\theta_0 + x^t\theta))$,

where $x = (x_1,\ldots,x_k)^t$ are the covariates associated with the random
variable $Y$. The sample space is $\mathcal{X} = \{0,1\} \times \mathbb{R}^k$ and the parameter space $\Theta$ is
$\mathbb{R}^{k+1}$. The group $G$ is generated by the composition of transformations of
the form

(4.12) $(y, x^t)^t \to (1 - y, x^t)^t$,

(4.13) $(y, x^t)^t \to (y, \mathcal{A}(x)^t)^t$,

where $\mathcal{A}$ is a nonsingular affine transformation $\mathcal{A}(x) = Ax + a$. The group $H_G$
of transformations of $\Theta$ induced by $G$ is given by

(4.14) $h_g(\theta) = -\theta$, $g$ as in (4.12),

(4.15) $h_g((\theta_0,\theta^t)^t) = (\theta_0 - a^t(A^t)^{-1}\theta, ((A^t)^{-1}\theta)^t)^t$, $g$ as in (4.13).

The metric $D$ on $\Theta$ is taken to be the Euclidean metric. All the conditions
for Theorem 3.1 are satisfied except that $G_1 = \emptyset$ and indeed the constant
functional $T(P) = 0$ for all $P$ is equivariant with breakdown point 1. If the
constant functional is not thought to be legitimate, an alternative one is the
following. For $\varepsilon > 0$ we define $T$ by

(4.16) $T(P) = \arg\min_{\theta_0,\theta} \int \left( \left( y - \frac{\exp(\theta_0 + x^t\theta)}{1 + \exp(\theta_0 + x^t\theta)} \right)^2 + \varepsilon(\theta_0 + x^t\theta)^2 \right) dP(x,y)$.

The additional term is a form of regularization which prevents explosion in
the case where the sets of $x$'s with $y = 1$ and with $y = 0$ are separated by
a hyperplane. The functional $T$ is equivariant. Consider a data set which
is such that any set of $(k+1)$-vectors $(1, x_i^t)^t$, $i = 1,\ldots,k+1$, is linearly
independent. On denoting the empirical distribution of a replacement sample
by $P^*$ we note that $T(P^*)$ remains bounded for all replacement samples
which contain at least $k+1$ of the original sample's values. The finite-sample
breakdown point is therefore $1 - k/n$.
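The equivariance of the regularized functional can be illustrated on the level of the criterion itself. The Python sketch below assumes that the criterion in (4.16) is the squared difference between $y$ and the logistic probability plus the penalty $\varepsilon(\theta_0 + x^t\theta)^2$; the squared-error form is a reading of the display, not something the sketch can confirm. It checks, for one covariate, that the empirical criterion is unchanged when the data are transformed as in (4.12) or (4.13) and the parameter as in (4.14) or (4.15), so the minimizers correspond. All names and data values are hypothetical.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def objective(theta0, theta, data, eps):
    """Empirical criterion: squared error plus eps * (linear predictor)^2.

    data is a list of pairs (y, x) with y in {0, 1} and x a tuple.
    """
    total = 0.0
    for y, x in data:
        eta = theta0 + sum(t * xi for t, xi in zip(theta, x))
        total += (y - sigmoid(eta)) ** 2 + eps * eta ** 2
    return total / len(data)

def flip(data):
    """Transformation (4.12): y -> 1 - y."""
    return [(1 - y, x) for y, x in data]

def affine_1d(data, A, a):
    """Transformation (4.13) for k = 1: x -> A * x + a."""
    return [(y, (A * x[0] + a,)) for y, x in data]

data = [(1, (0.5,)), (0, (-1.2,)), (1, (2.0,)), (0, (0.3,))]
theta0, theta, eps = 0.4, (-0.8,), 0.1
A, a = 2.0, -1.0

base = objective(theta0, theta, data, eps)
flipped = objective(-theta0, (-theta[0],), flip(data), eps)
# (4.15) for k = 1: theta -> theta / A, theta0 -> theta0 - a * theta / A
moved = objective(theta0 - a * theta[0] / A, (theta[0] / A,),
                  affine_1d(data, A, a), eps)
print(base, flipped, moved)  # three equal values up to rounding
```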
4.7. Logistic regression II. We consider the growth model

(4.17) $Y(t) = \exp(a + bt)/(1 + \exp(a + bt)) + \varepsilon(t)$,

which has an obvious equivariance structure. We define $\psi(y)$ by

$\psi(y) = \max\{0, \min\{1, y\}\}$

and a functional $T$ by

$T(P) = \arg\min_{a,b} \int (\psi(y) - \exp(a + bt)/(1 + \exp(a + bt)))^2\, dP(y,t)$.

Given a data set $(y(t_i), t_i)$, $i = 1,\ldots,n$, we see that $T$ will only break down if
there exists a $t$ such that $y(t_i) = 0$ for all $t_i < t$ and $y(t_i) = 1$ for all $t_i > t$ or
vice versa. From this it follows that in general the finite-sample breakdown
point will be $1 - 1/n$. This is much higher than the breakdown point of the
LMS functional, which is about 1/2 [see Stromberg and Ruppert (1992),
Section 5].

5. Attaining the bound.

5.1. Location functionals. The translation equivariant $L_1$-functional

$T(P) = \arg\min_{\mu} \int (\|x - \mu\| - \|x\|)\, dP(x)$

attains the bound of 1/2 of Section 4.1. It is not affine equivariant and
attempts to prove the bound of 1/2 for affine equivariant functionals in $\mathbb{R}^k$ with
$k \ge 2$ have not been successful [Niinimaa, Oja and Tableman (1990),
Lopuhaä and Rousseeuw (1991), Gordaliza (1991), Lopuhaä (1992) and Donoho
and Gasko (1992)]. The proof of Theorem 3.1 also fails for the affine group
as $G_1 = \emptyset$. That a bound of 1/2 does not hold globally is shown by the
example $\mathcal{X} = \mathbb{R}^2$ with point mass 1/3 on the points $x_1 = (0,1)$, $x_2 = (0,-1)$,
$x_3 = (\sqrt{3},0)$. More generally, in $k$ dimensions there are samples for which
$1/(k+1)$ is the maximal breakdown point. In spite of this, there are samples
where a finite-sample breakdown point of 1/2 is attainable. The construction
is somewhat complicated and may be found in Davies and Gather (2002).

5.2. Scatter functionals. The median absolute deviation (MAD) has a
finite-sample breakdown point of $\max(0, 1/2 - \Delta(P_n))$, which is less than
the upper bound of Theorem 3.2. We propose a modification of the MAD
which does attain the upper bound. For a probability measure $P$ we define
the interval $I(P,\lambda)$ by $I(P,\lambda) = [\mathrm{med}(P) - \lambda, \mathrm{med}(P) + \lambda]$ and write

(5.1) $\Delta(P,\lambda) = \max\{P(\{x\}) : x \in I(P,\lambda)\}$.

The new scale functional MAD* is defined by

$\mathrm{MAD}^*(P) = \min\{\lambda : P(I(P,\lambda)) \ge (1 + \Delta(P,\lambda))/2\}$,

which can easily be calculated. It achieves the upper bound of Theorem 3.2.
The breakdown point in terms of metrics depends on the metric used [see
Huber (1981), page 110]. For the Kuiper metric based on one interval the
breakdown point is $(1 - \Delta(P))/3$ [see also Davies (1993)] while for the Kuiper
metric based on three intervals it is $(1 - \Delta(P))/2$ [see Davies and Gather
(2002)].
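For an empirical distribution the minimum defining MAD* is attained at one of the absolute deviations from the median, so it can be found by scanning these in increasing order. The Python sketch below does this, reading the defining condition as $P(I(P,\lambda)) \ge (1 + \Delta(P,\lambda))/2$ and using the midpoint convention of `statistics.median` for even sample sizes; both choices, and the function names, are assumptions made for this illustration.

```python
from collections import Counter
from statistics import median

def mad(sample):
    """Classical median absolute deviation."""
    med = median(sample)
    return median(abs(x - med) for x in sample)

def mad_star(sample):
    """Smallest lam with P(I(P, lam)) >= (1 + Delta(P, lam)) / 2."""
    n = len(sample)
    med = median(sample)
    counts = Counter(sample)
    for lam in sorted({abs(x - med) for x in sample}):
        inside = [x for x in sample if abs(x - med) <= lam]
        largest_atom = max(counts[x] for x in inside)
        # Multiply the defining inequality by 2n to stay in integers:
        # 2 * #inside >= n + (largest multiplicity inside the interval).
        if 2 * len(inside) >= n + largest_atom:
            return lam
    # The largest deviation always satisfies the condition, so the loop
    # returns for every nonempty sample.

tied = [0, 0, 0, 0, 1, 2, 3]
print(mad(tied), mad_star(tied))
```

With four of the seven observations tied at the median the classical MAD collapses to 0, whereas the modified functional returns 2.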

6. Final remarks. As mentioned in the Introduction the definition of 
breakdown point should meet the following three goals: it should be sim- 
ple, it should reflect the behavior of statistical functionals for finite samples 
and it should allow useful comparisons between statistical functionals. We 
examine these demands more closely for the case of a location functional
in $\mathbb{R}$. The definition of breakdown point (2.4) involves a limiting operation
and this is an essential part of its simplicity. If $\infty$ in (2.4) were replaced
by some large number the simplicity would be lost. The simplification
resulting from the limiting operation will only be successful if the resulting
definition reflects the behavior for finite samples. The situation is analogous 
to the limiting operation of differentiation which reflects the behavior of the 
function for small but finite values. The breakdown points of 1/n for the 
mean and 1/2 for the median do reflect their finite-sample behavior. As the 
median is translation equivariant and the highest breakdown point for such 
functionals is 1/2, we seem to have achieved all three goals. If no restric- 
tions were imposed on the class of allowable functionals, then breakdown 
points of 1 become attainable. We know of no situation not based on equiv- 
ariance considerations where it can be shown that the highest breakdown 
point for a class of reasonable functionals is less than 1. A referee suggested
the following example: estimate $b$ in the model $E(y|x) = bx$ from $2m$ points
at $x = 0$ and another $m$ points at $x = 1$ where the conditional distribution
of $y$ given $x$ is normal with mean zero and variance 1. The problem is to
construct a consistent estimator with a breakdown point of more than 1/3.
We construct one with breakdown point 1. We give a finite-sample version.
The data points are $(x_1,y_1),\ldots,(x_n,y_n)$ with empirical distribution $P_n$. If
the $x_i$ all equal 0 we put $T(P_n) = 0$. Otherwise we set

(6.1) $T(P_n) = \max\{-n, \min\{n, T_{LS}(P_n)\}\}$,

where $T_{LS}$ is the least squares estimator through the origin. As $|T(P_n)|$ is
bounded by $n$ for any empirical distribution it has finite-sample
breakdown point 1. On the other hand it is consistent. Equivariance considerations
prohibit such a construction. In certain situations location functionals which 






P. L. DAVIES AND U. GATHER



are not translation equivariant may be preferred. If, for example, there is 
prior knowledge about the range of possible values of the location, then this 
can be exploited to give a breakdown point of 1. In all the situations we have 
considered where a breakdown point of 1 is attainable, it has proved to be 
quite easy to produce a perfectly sensible functional which attains or almost 
attains a breakdown point of 1. If this had been the case for equivariant 
functionals, we suspect that not so much research would have been devoted 
to the problem of high breakdown functionals. The breakdown point of 1/2 
for the median reflects its behavior at the following samples: 



In both cases as A tends to infinity the median breaks down in spite of 
the fact that the proof of Theorem 3.2 only covers the behavior at sam- 
ple (6.2). Indeed any translation equivariant functional will break down at 
sample (6.2) but it is easy to define translation equivariant functionals which 
do not break down at sample (6.3). Although a functional which does not 
break down at (6.3) may seem artificial, there are quite plausible situations 
where a similar phenomenon occurs. The noise may be simple white noise 
and the signal a very small subset of the data which lies very close to a 
straight line. It may well be possible to find this subset in spite of 99% of 
the data being noise and moreover, this may be accomplished in an equiv- 
ariant manner. The behavior of the median at sample (6.3) is not explained 
by its translation equivariance and its breakdown point of 1/2. The median 
must have some other, as yet unspecified, property beyond equivariance 
which makes the breakdown point of 1/2 a good description of its behavior. 
Thus even in the case of equivariance the success of the concept of break- 
down point would seem to be more fragile than is generally supposed. It is 
perhaps a case of invisible small print. 
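The truncated least squares functional (6.1) from the referee's example is simple enough to write out. The Python sketch below clips the least squares slope through the origin to $[-n, n]$ and returns 0 when the slope is not identified, which is taken here to mean that all $x_i$ equal 0; that reading, the function name and the data are assumptions made for this illustration.

```python
def clipped_ls(points):
    """Least squares slope through the origin, truncated to [-n, n]."""
    n = len(points)
    sxx = sum(x * x for x, _ in points)
    if sxx == 0:                      # all x_i equal 0: slope not identified
        return 0.0
    sxy = sum(x * y for x, y in points)
    return max(-n, min(n, sxy / sxx))

# 2m points at x = 0 and m points at x = 1, here with m = 5.
clean = [(0, v) for v in (0.3, -1.1, 0.8, 0.1, -0.4, 1.2, -0.7, 0.0, 0.5, -0.2)]
clean += [(1, v) for v in (0.2, -0.1, 0.4, 0.0, -0.5)]
corrupted = [(x, 1e12) for x, _ in clean]  # every response replaced

print(clipped_ls(clean), clipped_ls(corrupted))
```

Even when every observation is replaced the value stays within $[-n, n]$, which is why the functional cannot break down in the sense of (2.5); it is not equivariant under rescaling of $y$.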

Acknowledgments. We acknowledge the work of two referees and an As- 
sociate Editor whose comments on the two versions of this paper led to 
a number of improvements in content and style. 